The Global Water Access Gap
Introduction
This blog analyzes the impact of differential access to drinking water around the world. As water is critical to our survival, water access affects nearly all parts of human life – including life expectancy, socioeconomic status, health and nutrition, and much more. With endless issues to study surrounding water access, we focus our blog on a few key interests of ours, including how water access levels relate to educational outcomes and economic development levels, as well as how social media are used as a tool for clean water advocacy.
We begin this blog with an overview of how drinking water levels and deaths due to unsafe drinking water vary at the global scale. We then analyze the differences in water access across various levels of economic development, and across urban and rural regions. Next, we look at water access in schools, and its effect on primary and secondary school enrollment rates. Finally, in keeping with our shared passion for social justice and equity, we present an analysis of the most common sentiments and words used in tweets regarding clean water advocacy.
Photo: Lys Arango for Action Against Hunger, Philippines
Data
Dataset 1: Drinking Water Access Worldwide by Households
Description: This dataset is from the World Health Organization (WHO), accessible through the JMP Global Database. The dataset contains household-level data on the coverage and service levels of water throughout the world. The JMP Database allows users to filter by residence type (urban, rural, and total) and year (2000 to 2017), as well as look at the data by country or by SDG (Sustainable Development Goal) regions. We were able to download this data in the format of a .csv file.
The water service level within this dataset is divided into five possible categories. In order of lowest to highest levels of access, they include Surface Water, Unimproved Service, Limited Service, Basic Service, and Safely Managed Service.
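One way to make this ordering usable in analysis is to store the service level as an ordered factor. The sketch below is only an illustration: the level names come from the list above, while the example values and variable names are made up rather than taken from the JMP data.

```r
# Service levels from lowest to highest access, as listed above.
service_levels <- c("Surface Water", "Unimproved Service", "Limited Service",
                    "Basic Service", "Safely Managed Service")

# Hypothetical example values (not from the JMP data).
example <- factor(c("Basic Service", "Surface Water", "Safely Managed Service"),
                  levels = service_levels, ordered = TRUE)

# Ordered factors support comparisons such as "at least basic service".
at_least_basic <- example >= "Basic Service"
print(at_least_basic)
```

With an ordered factor, a phrase like "at least basic service" becomes a single comparison instead of a list of matching labels.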
Dataset 2: Deaths Caused by Unsafe Drinking Water
Description: This dataset is from the Global Health Data Exchange (GHDx), accessed through the Institute for Health Metrics and Evaluation at the University of Washington. The data includes the percentage of a country’s deaths that are caused by unsafe drinking water, as well as the number of deaths caused by unsafe drinking water from 1990 through 2019. We were able to directly download a .csv file for data on all countries for the year 2017 (selected to match the most recent year of available data in Dataset 1).
Dataset 3: Drinking Water Access Worldwide in Schools
Description: This dataset is from the same source as Dataset 1 (WHO) and was also found through the JMP Global Database. However, rather than household-level data, this dataset focuses on how water service levels vary in schools across the world. The dataset allows users to view country-level data, as well as data by other relevant groupings including the SDG (Sustainable Development Goal) regions, which we chose to focus on given how closely our topic relates to the SDGs.
An important difference between this dataset and Dataset 1 is that the water service levels are defined slightly differently. While the household-level dataset breaks water access down into 5 levels, this dataset only includes 3, likely due to data collection constraints. The 3 groupings (in order of lowest to highest levels of water service in schools) are No Service, Limited Service, and Basic Service.
Dataset 4: Primary and Secondary School Enrollment Rates
Link: https://ourworldindata.org/primary-and-secondary-education#enrolment-in-primary-school
Description: This website presents various statistics on school attendance, completion, and enrollment around the world, using data from the World Bank. We used 2 datasets from this site on school enrollment rates by country, one with primary school data and one with secondary school data. The data were collected in different years for each country, and we kept the most recent year of data for each country in our analysis.
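Keeping the most recent year per country can be sketched as follows. The data frame, its column names, and its values are hypothetical stand-ins, and dplyr is assumed to be available; this is not the wrangling code actually used for the blog.

```r
library(dplyr)

# Hypothetical toy data; column names and values are illustrative only.
enrollment <- data.frame(
  country = c("A", "A", "B", "B", "B"),
  year = c(2015, 2018, 2012, 2016, 2014),
  gross_enrollment = c(98.2, 101.5, 87.0, 90.3, 88.1)
)

# Keep only the row with the latest year for each country.
latest <- enrollment %>%
  group_by(country) %>%
  slice_max(year, n = 1, with_ties = FALSE) %>%
  ungroup()

print(latest)
```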
It is important to note that these data are reported as gross enrollment ratios, meaning the number of individuals enrolled in primary or secondary school, regardless of age, divided by the total population of official school age for that level. Therefore, it is common for some countries to have a gross enrollment ratio above 100%, because over-aged or under-aged students are counted among those enrolled but not in the eligible-aged population.
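A small worked example shows how the ratio can exceed 100%. The function name and the numbers below are made up for illustration.

```r
# Gross enrollment ratio: everyone enrolled at a level, regardless of age,
# divided by the population of official age for that level, times 100.
gross_enrollment_ratio <- function(enrolled_all_ages, official_age_population) {
  100 * enrolled_all_ages / official_age_population
}

# Hypothetical country: 1,000 children of official primary age, 950 of whom
# are enrolled, plus 120 over-aged or under-aged pupils in primary school.
ger <- gross_enrollment_ratio(950 + 120, 1000)
print(ger)
```

Here the 120 over-aged or under-aged pupils raise the numerator without changing the denominator, which pushes the ratio past 100.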
Dataset 5: Water Services and Economic Development
Description: We used the World Bank’s databank tool to create a dataset that contains GDP per capita ($), water service level, water service type, and Gini coefficient for countries across the world from 2000 to 2017. Since the distribution of GDP per capita is right-skewed, we created a loggdp variable to better visualize the data.
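The log transformation might look like the sketch below. The toy data and the gdp_per_capita column name are hypothetical, and the natural logarithm is an assumption, since the base used for loggdp is not stated.

```r
library(dplyr)

# Hypothetical toy data; the gdp_per_capita column name is an assumption.
gdp <- data.frame(
  country = c("A", "B", "C"),
  gdp_per_capita = c(500, 5000, 50000)
)

# Natural log is assumed here; the original base is not stated.
gdp <- gdp %>%
  mutate(loggdp = log(gdp_per_capita))

print(gdp)
```

On the log scale, each tenfold increase in GDP per capita is the same distance apart, which is why the right skew becomes easier to read in a plot.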
Other Datasets:
For the text analysis of tweets, Masahiro created a unique dataset using the search_tweets() function through his Twitter developer account. The dataset covers tweets posted between April 27th and May 7th of this year. A tweet was selected if it included one of the following phrases: “water access,” “Water access,” “Water Access,” “access to clean water,” or “access to drinking water.”
In addition to our main informational datasets, we also used the maps package for spatial visualizations and the United Nations’ SDG regional grouping dataset (https://unstats.un.org/sdgs/indicators/regional-groups) to complement the data in Dataset 5.
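To make the selection rule concrete, the sketch below applies the same phrases as a local filter on a few made-up tweets. It is not the Twitter API query itself: the example texts are hypothetical, and matching case-insensitively is an assumption that folds the three capitalizations of “water access” into one pattern.

```r
# Search phrases from the text, with the case variants folded together.
phrases <- c("water access", "access to clean water", "access to drinking water")
pattern <- paste(phrases, collapse = "|")

# Hypothetical tweet texts, for illustration only.
tweets <- c(
  "Water Access is a human right",
  "Everyone deserves access to clean water.",
  "Clean water matters",
  "Improving access to drinking water in schools"
)

# TRUE where a tweet contains at least one of the phrases.
matches <- grepl(pattern, tweets, ignore.case = TRUE)
print(tweets[matches])
```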
Worldwide Water Service Access and Deaths Caused by Unsafe Drinking Water (Alastair)
Percentage of Deaths and Water Service Levels by Country (Map)
Limitations
Countries without data on water service access level and deaths caused by unsafe drinking water include: Taiwan, Argentina, Dominica, Palestine, Eritrea, Central African Republic, and Saint Kitts and Nevis.
The two countries not mapped are: Tokelau and Tuvalu. They are not included in leaflet’s world map, although there is data about their access to different water service levels as well as data on deaths related to unsafe drinking water for these two countries.
The year 2017 was the most recent year included in the water access dataset, so we are working under the assumption that those conditions are similar enough to the conditions in 2021 to draw meaningful analysis about the current global water access gap.
Conclusions
- Countries in Central Africa and South/Southeast Asia appear to have the highest percentages of deaths caused by unsafe drinking water in 2017.
- Although countries like Chad, Nigeria, and Madagascar have some of the highest percentages of deaths, India has the largest number of deaths of any country, with over 500,000 deaths caused by unsafe drinking water in 2017.
- Interestingly, even countries with 100% safely managed service may still have some deaths; New Zealand, for example, had 14 deaths caused by unsafe drinking water in 2017.
- There is a high correlation between countries that have few deaths and countries where a high percentage of the population relies on at least basic service for drinking water.
Water Service Access, Economic Development, and Inequality (Siyi)
How do countries of different economic development levels differ in their access to water services and how has that changed over time?
Click here to view the interactive Shiny app.
Use of Data
This Shiny app focuses on two variables as indicators for economic development: GDP per capita ($) and the Gini coefficient. Despite its limitations, GDP per capita, which is the economic output per person in a country, is generally regarded as an effective indicator of economic development levels. The Gini coefficient measures the income inequality of a country, with a value of 0 representing perfect equality and 1 representing maximal inequality.
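To give a feel for what the Gini coefficient captures, the sketch below computes it from a vector of incomes using the textbook mean-absolute-difference definition. This is an illustration only, not how the World Bank estimates the figures in the dataset, and the incomes are made up.

```r
# Gini coefficient as the mean absolute difference between all pairs of
# incomes, divided by twice the mean income.
gini <- function(x) {
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

# Hypothetical incomes: everyone earns the same (perfect equality).
equal_incomes <- gini(c(10, 10, 10, 10))

# Hypothetical incomes: one person holds everything (maximal inequality).
one_holds_all <- gini(c(0, 0, 0, 100))

print(c(equal_incomes, one_holds_all))
```

With only four people, the most unequal case gives 0.75 rather than 1, because under this definition the maximum is (n - 1) / n and approaches 1 only as the population grows.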
Trends
- Countries with higher GDP per capita generally tend to have higher coverage for at least basic and safely managed water service in terms of both drinking water and sanitation from 2000 to 2017.
- Countries with high economic inequality (large Gini coefficient) tend to have medium-level water service coverage, while countries with low income disparity tend to either have very high water service coverage or lag behind in water service coverage.
- Countries in Europe and Northern America generally have higher GDP per capita, smaller Gini coefficients, and better water service coverage than other countries.
- Countries in Central and Southern Asia and Sub-Saharan Africa tend to have lower GDP per capita and lag behind other countries in water service coverage.
- There is a general growth in water service coverage around the world from 2000 to 2017.
Conclusions
Add conclusion.
How does access to water services differ across urban and rural regions around the world and how has that changed over time?
Click here to view the interactive Shiny app.
Trends
World
- In terms of both drinking water and sanitation, there is an obvious difference in water service coverage across urban and rural areas; urban areas tend to have higher coverage of safely managed and basic services than rural areas.
- At least basic water service coverage grew in both urban and rural areas between 2000 and 2017, and the gap between them is gradually decreasing.
SDG Regions
- The gap between urban and rural areas in different SDG regions is generally decreasing from 2000 to 2017. However, regional data on water service coverage by residence type is sparse: many regions lack data for urban coverage, rural coverage, or both, so it is hard to tell what the change over time looks like.
- Across different regions and time, urban areas tend to have higher water service coverage.
- There are significant inequalities across regions. For instance, safely managed drinking water service coverage in rural Europe and Northern America was more than twice that of urban Sub-Saharan Africa in 2000.
- The scale of urban-rural disparity in water service coverage differs across service types, service levels, time, and regions. For instance, in 2000, the urban-rural difference in safely managed drinking water service coverage was larger in some regions, such as Central and Southern America and Sub-Saharan Africa, than in others, such as Europe and Northern America. However, in the same year, Sub-Saharan Africa’s urban-rural difference in safely managed sanitation service was much smaller than that of Europe and Northern America.
Limitations
Add limitations.
Conclusions
Add conclusion.
Water Access in Schools (Jamie)
Background
Add more here. Click here to view the interactive Shiny app.
Impact on Enrollment
Add more here. Click here to view the interactive Shiny app.
Limitations
Add limitations.
Conclusions
Add conclusion.
Advocacy for Water Access Around the World (Masahiro)
Introduction
In this tab, we look at tweets advocating for greater access to water around the world in order to discover trends among them. To gather the tweets, the search_tweets() function was run on May 4th and May 7th, and the tweets generated roughly from April 27th to May 7th were recorded in a single dataset. Every included tweet contains at least one of the following phrases: “water access,” “Water access,” “Water Access,” “access to clean water,” or “access to drinking water.” For more details, take a look at the “Wrangling - Masahiro” file in the same repo. By exploring the following three questions with data, we aim to learn what kind of rhetoric people employ when calling for more access to water around the world.
- What are the common words used in the tweets requesting more access to clean water around the world?
- What are the common sentiments of the words observed in those tweets?
- What do those common words and sentiments imply about the rhetoric people use when arguing for clean water in some of the regions lacking water access?
In addition to removing so-called stop words from the dataset, we also omitted the word “access,” because the way we collected the data means that every tweet should include it. Doing so helps us produce more meaningful word clouds and sentiment analyses.
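The cleaning step described above can be sketched as follows. It assumes the tidytext package, whose unnest_tokens() function and stop_words dataset are one common way to do this in R; the actual wrangling file may differ, and the tweet texts here are made up.

```r
library(dplyr)
library(tidytext)

# Hypothetical tweet texts, for illustration only.
tweets <- data.frame(
  id = 1:2,
  text = c("Access to clean water is a human right",
           "We need water access for all")
)

words <- tweets %>%
  # split each tweet into one lowercase word per row
  unnest_tokens(word, text) %>%
  # drop common stop words using tidytext's built-in list
  anti_join(stop_words, by = "word") %>%
  # drop "access", which appears in every tweet by construction
  filter(word != "access")

print(words)
```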
Word Cloud
First, we examine a word cloud of all the words except those removed during data wrangling, to get a sense of the most common words in the focal tweets.
In the above word cloud, “https” stands out in its size, which implies that many tweets related to water access advocacy refer to or cite other web resources. “Clean” is also displayed prominently, which should be partly because “access to clean water” is one of the phrases we actively searched for when scraping tweets. However, we also searched for “access to drinking water,” and “drinking” is not nearly as large as “clean” in the display, so the word “clean” seems to carry particular importance in arguments for greater water access. Turning to the words displayed at smaller sizes, the cloud includes many words related to potential uses of water or implications of access to water: “sanitation,” “healthcare,” “health,” “food,” and “hygiene.” One interesting word in the cloud is “india,” whose presence may be attributable to India’s socioeconomic standing or to its especially large population. Finally, we found it intriguing that “covid” appears in the above visualization, because it suggests that tweets about water access were often associated with the pandemic, although there did not seem to be many explicit or obvious connections between the infectious disease and water access.
Sentiment Analysis
Next, we dive into the sentiments reflected in the English used by those advocating for water access on Twitter. We use the NRC lexicon to attach sentiments to the words observed in the tweets, and visualize the most common sentiments with the following graph.
As can be seen, positive, trust, and joy are the most common sentiments among the words in the tweets. Negative follows those top three, and the least common sentiments, such as anger, anticipation, fear, and sadness, come after it. This bar chart confirms that many of the words used in the analyzed tweets carry positive connotations, covering not only “positive” as a sentiment but also “trust” and “joy.” To learn more about the words tagged with these sentiments, we use a comparison cloud (see the next tab).
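Tagging words with a lexicon amounts to a join between the tweet words and a table of word-sentiment pairs. The sketch below uses a tiny invented lexicon in place of NRC, so every pairing in it is a placeholder; the real lexicon is usually loaded with tidytext's get_sentiments("nrc"), which requires a one-time download.

```r
library(dplyr)

# Toy stand-in for the NRC lexicon; these pairings are invented.
toy_lexicon <- data.frame(
  word = c("clean", "clean", "clean", "safe", "safe", "disease", "disease"),
  sentiment = c("positive", "trust", "joy", "positive", "trust",
                "negative", "fear")
)

# Hypothetical words taken from tweets after stop-word removal.
tweet_words <- data.frame(
  word = c("clean", "water", "safe", "clean", "disease")
)

sentiment_counts <- tweet_words %>%
  # one row per distinct word with its number of uses
  count(word, name = "uses") %>%
  # a single word can carry several sentiments
  inner_join(toy_lexicon, by = "word") %>%
  group_by(sentiment) %>%
  summarize(total = sum(uses)) %>%
  arrange(desc(total))

print(sentiment_counts)
```

Because one word can map to several sentiments, a frequent word such as “clean” adds to every category it belongs to, and words missing from the lexicon (here “water”) drop out of the counts entirely.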
Comparison Cloud
The comparison cloud below displays the words that are commonly used in the text scraped from Twitter and that also carry the sentiments “positive,” “trust,” or “joy.” Before diving into detailed observations about the visualization itself, we lay out how the code below works. A comparison cloud accomplishes two goals simultaneously: comparing the relative frequency of certain words and classifying the most commonly used words into categories based on certain criteria. To build a comparison cloud, the data must first be transformed into a matrix in which each column corresponds to a category (in this case, a sentiment) and each row corresponds to a word, named by that word. Getting there took a fair amount of wrangling to create a dataset whose rows correspond to words and whose columns correspond to sentiments. If interested, read through the commented code below.
# packages used below: dplyr for wrangling, tidyr for spread(),
# wordcloud for comparison.cloud()
library(dplyr)
library(tidyr)
library(wordcloud)

# preliminary wrangling below
# first extract words with the connotations of interest
# tweets_sentiment = dataset used for sentiment analysis
pure_words <- tweets_sentiment %>%
  filter(sentiment %in% c("positive", "trust", "joy")) %>%
  # then collapse the rows so that each word only occupies a single row
  group_by(word) %>%
  summarize()
# now prepare the dataset to be joined with the dataset about the count of
# each word with the three focal sentiments
pure_words_copied <- pure_words %>%
  # let each word occupy three rows at the same time
  slice(rep(1:n(), each = 3)) %>%
  mutate(number = row_number()) %>%
  # list all the sentiments of interest
  mutate(sentiment = case_when(number %% 3 == 1 ~ "positive",
                               number %% 3 == 2 ~ "trust",
                               number %% 3 == 0 ~ "joy")) %>%
  select(word, sentiment)
# the below dataset is about the count of each word with the three connotations
# of interest
comparison_words_prep <- tweets_sentiment %>%
  # extract those with the three sentiments of interest
  filter(sentiment %in% c("positive", "trust", "joy")) %>%
  # and count the frequency
  group_by(word, sentiment) %>%
  # drop the grouping afterwards so the result is a plain table of counts
  summarize(N = n(), .groups = "drop")
comparison_words_prep_2 <- pure_words_copied %>%
  # join the dataset with the data about the count (used for the bar)
  left_join(comparison_words_prep, by = c("word", "sentiment")) %>%
  # if a word does not carry a given sentiment, the join leaves an NA,
  # so turn it into 0
  mutate(count = case_when(is.na(N) ~ 0,
                           TRUE ~ as.numeric(N))) %>%
  select(word, sentiment, count)
# one last step to make each column refer to each sentiment
comparison_words_prep_3 <- comparison_words_prep_2 %>%
  spread(key = sentiment, value = count)
# the below code translates the data frame into a matrix, and each row name of
# the matrix should correspond to the word
comparison_words <- comparison_words_prep_3 %>%
  select(-word) %>%
  as.matrix()
rownames(comparison_words) <- comparison_words_prep_3$word
# create the comparison cloud
colors1 <- c("#48F11F", "#1226D2", "#CB0A3E")
colors2 <- c("#CCFF99", "#7F88EF", "#EF7FCA")
comparison.cloud(comparison_words, max.words = 100,
                 random.order = FALSE,
                 colors = colors1,
                 title.colors = colors1,
                 title.bg.colors = colors2)
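For readers who want to see the target shape in isolation, a word-by-sentiment count matrix can also be built in one step with base R's table(). This is an alternative sketch on made-up data, not the code used for the cloud above.

```r
# Hypothetical word-sentiment rows, one per tagged word occurrence.
toy <- data.frame(
  word = c("clean", "clean", "safe", "clean", "healthy"),
  sentiment = c("positive", "trust", "positive", "joy", "positive")
)

# Cross-tabulate words against sentiments; missing pairs become 0.
m <- unclass(table(word = toy$word, sentiment = toy$sentiment))

print(m)
```

Each row is a word, each column is a sentiment, and word-sentiment pairs that never occur are filled with 0, which is the layout comparison.cloud() expects.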
As in the first word cloud, “clean” stands out in this comparison cloud for its frequency of use, as shown by its large size. However, as the bar chart in the previous tab showed, words classified as positive have the largest presence as a category, which means that “clean” is not so frequent that it dominates the text analysis on its own. Taking a closer look at the visualization above, we noticed that it includes many words related to the potential outcomes of greater access to water around the world: “food,” “healthy,” “save,” “green,” “income,” “medical,” “safe,” “luxury,” and “survive.” This finding resonates with the insights from the original word cloud, because both visualizations show many words associated with the promising implications of access to water. The comparison cloud also shows that the tweets contain a number of words related to the process of ensuring water access for underprivileged people: “advocate,” “guarantee,” “partnership,” “supporting,” “improving,” “conservation,” and “providing.” This suggests that describing the steps needed to secure water access around the world leads advocacy tweets to include many words with positive connotations, tagged as positive, trust, or joy.
Discussion
Through the exploration of the general word cloud, a bar chart, and a comparison cloud, this research has revealed that the tweets requesting greater access to water across the world incorporate many words that connote positivity, joy, and trust, and that they specifically include many words related to the potential outcomes of access to water, such as “sanitation” or “food.” We believe this may plausibly be because many of these tweets describe and discuss how securing water access can improve the lives of people in developing countries, or what such water access enables. This explanation is convincing to some extent, given that the comparison cloud has shown many words that can be associated with the process of improving water access, such as “partnership” or “donation.”
In other words, this study has revealed that the tweets arguing for water access around the world do not engage with negative words, such as death or disease, as much as they do with words carrying the sentiments “positive,” “joy,” and “trust.” This implies that advocacy tweets may talk more about how greater water access can resolve problems in the world by, for example, improving sanitation, food access, and safety in some areas, than about how lack of water causes disease, death, conflict, or other suffering. We find this speculation fairly plausible given all the results above, and we find it intriguing that people describe more of the positive aspects of securing clean water and fewer of the negative consequences of lacking it.
However, we also acknowledge that these findings have limitations. The word cloud, bar chart, and comparison cloud are all generated after cutting the tweets into individual words. In other words, we are not really analyzing sentences, so we cannot strictly distinguish between the following two phrases: “today’s effort for greater water access can improve sanitation around the world,” and “today’s effort for greater water access does not improve sanitation around the world.” The two phrases contain an almost identical set of words. Moreover, the negative connotation of the latter is almost entirely due to the word “not,” which would have been removed as a stop word at the beginning of the data wrangling, so our analysis cannot tell the sentiments of the two phrases apart. Our findings do identify common words among the tweets of interest, point to positive, joy, and trust as common sentiments, and suggest explanations about people’s rhetoric that are consistent with the visualizations here. In short, as a blog project, we are confident that the text analysis of tweets has given substantial new perspectives on people’s discourse around greater water access. However, we also believe that more text analysis techniques are needed to generate more accurate and meaningful findings. Future research could analyze these tweets not only as a set of words but also as a collection of bigrams or larger units, in order to build upon and expand the discoveries here.
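The bigram idea can be sketched with the two example phrases above. The snippet assumes the tidytext package and its unnest_tokens() function with token = "ngrams"; it is a starting point under that assumption rather than a finished analysis.

```r
library(dplyr)
library(tidytext)

# The two example phrases from the discussion above.
phrases <- data.frame(
  id = 1:2,
  text = c(
    "today's effort for greater water access can improve sanitation around the world",
    "today's effort for greater water access does not improve sanitation around the world"
  )
)

# Split each phrase into overlapping two-word units instead of single words.
bigrams <- phrases %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

# The negation survives as part of a bigram, so the phrases now differ.
negated <- bigrams %>%
  filter(bigram == "not improve")

print(negated)
```

Because “not” stays attached to “improve,” a bigram-level analysis can separate the two phrases, which the word-level analysis could not.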
Conclusion
Bibliography
DataBank. The World Bank Group. 2021. https://databank.worldbank.org/home.aspx
Global Health Data Exchange. Institute for Health Metrics and Evaluation at the University of Washington. 2019. http://ghdx.healthdata.org/gbd-results-tool
Roser, M. and Ortiz-Ospina, E. Primary and Secondary Education. Our World In Data. 2013. https://ourworldindata.org/primary-and-secondary-education
SDG Indicators. United Nations. 2021. https://unstats.un.org/sdgs/indicators/regional-groups
Water Supply, Sanitation and Hygiene (WASH) Household Data. WHO/UNICEF Joint Monitoring Programme (JMP). 2017. https://washdata.org/data/household#!/
Water Supply, Sanitation and Hygiene (WASH) School Data. WHO/UNICEF Joint Monitoring Programme (JMP). 2019. https://washdata.org/data/school#!/